Abstract. Effective static analyses have been proposed which infer bounds on the number of resolutions or reductions. These have the advantage of being independent of the platform on which the programs are executed, and they have been shown to be useful in a number of applications, such as granularity control in parallel execution. On the other hand, in distributed computation scenarios where platforms with different capabilities come into play, it is necessary to express costs in metrics that include the characteristics of the platform. In particular, it is especially interesting to be able to infer upper and lower bounds on actual execution times. With this objective in mind, we propose an approach which combines compile-time analysis for cost bounds with a one-time profiling of the platform in order to determine the values of certain parameters for a given platform. These parameters calibrate a cost model which, from then on, is able to statically compute time bound functions for procedures and to predict with a significant degree of accuracy the execution times of such procedures on the given platform. The approach has been implemented and integrated in the CiaoPP system.

Keywords: Execution Time Estimation, Cost Analysis, Profiling, Resource Awareness, Cost Models, Mobile Computing.



1 Introduction 

Statically predicting the running time of programs has many applications, ranging from task scheduling in parallel execution to proving the ability of a program to meet strict time constraints in real-time systems. A starting point for attacking this problem is to infer the computational complexity of such programs. This is one of the reasons why the development of static analysis techniques for inferring cost-related properties of programs has received considerable attention. However, in most cases such cost properties are expressed using platform-independent metrics. For example, [4,5] present a method for automatically inferring functions which capture an upper bound on the number of resolution steps or reductions that a procedure will execute as a function of the size of its input data. In [10, 11] the method of [4, 10] was fully automated in the context of a practical compiler, and in [6, 10] a similar approach was applied in order to also obtain lower bounds, which are especially relevant in parallel execution. Such platform-independent cost information (bounds on the number of reductions) has been shown to be quite useful in various applications, including, for example, scheduling parallel tasks [8,10,11]. In a typical scenario, these tasks will be executed on a single parallel machine, where all processors are usually identical. Therefore, the deduced number of reductions can be used as a relative measure to compare, to a first degree of approximation, the amount of work under each task.

However, in distributed execution and other mobile/pervasive computation scenarios, where different platforms come into play and each platform has different computing power, it becomes necessary to express costs in metrics that can later be instantiated to different architectures, so that actual running times can be compared using the same units. This also applies to heterogeneous parallel computing platforms. With this objective in mind, we present a framework which combines cost analysis with profiling techniques in order to infer functions which yield bounds on platform-dependent execution times of procedures. Platform-independent cost functions are first inferred which are parameterized by certain constants. These constants aim at capturing the execution time of certain low-level operations on each platform. For each execution platform, the values of such constants are determined experimentally once and for all by running a set of synthetic benchmarks and measuring their running times with a profiling toolkit that we have also developed. Once these constants are determined, they are fed into the model with the objective of predicting execution times with a certain accuracy. We have studied a relatively large number of cost models, involving different sets of constants, in order to explore experimentally which of the models produces the most precise results, i.e., which parameters best model and predict the actual execution times of procedures. In doing this we have taken into account the trade-off between the simplicity of the cost models (which implies efficiency of the cost analysis and also simpler profiling) and the precision of their results. With this aim, we have started with a simple model and explored several possible refinements.

In addition to cost analysis, the implementation of profilers in declarative languages has also been considered by various authors, with the aim of helping to discover why a part of a program does not exhibit the expected performance. Debray [3] showed the basic considerations to keep in mind when profiling Prolog programs: handling backtracking and failure. Ducasse [7] designed and implemented a trace analyzer for Prolog which can be applied to profiling. Sansom and Peyton Jones [13] focused on profiling of functional languages using a semantic approach and highlighted the difficulty of profiling such languages. Jarvis and Morgan [12] showed how to profile lazy functional programs. Brassel et al. [1] solved part of the difficulty in profiling when considering special features of functional logic programs, such as sharing, laziness, and non-determinism. We also use profiling but, since our aim is to predict performance, profiling will in our case be aimed at calibrating the values of some constants that appear in the cost functions, and which are instrumental in forecasting execution times for a given platform and cost model. Therefore, we will not profile with just some fixed input arguments, but with a set of programs and input arguments which we expect to be representative enough to derive meaningful characteristics of an execution platform.



2 Static Platform-Dependent Cost Analysis 



In this section we present the compile-time cost bounds analysis component of our combined framework. This analysis has been implemented and integrated in CiaoPP [9] by extending previous implementations of reduction-counting cost analyses. The inferred (upper or lower) bounds on cost are expressed as functions of the sizes of the input arguments and use several platform-dependent parameters. Once these parameters are instantiated with values for a given platform, such functions yield bounds on the execution times required by the computation on that platform. The analyzer can use several metrics for computing the "size" of an input, such as list-length, term-size, term-depth, integer-value, etc. Types, modes, and size measures are first automatically inferred by other analyzers which are part of CiaoPP and then used in the size and cost analysis.

2.1 Platform-Independent Static Cost Analysis 

As mentioned before, our static cost analysis approach is based on that developed 
in [4, 5] (for estimation of upper bounds on resolution steps) and further extended 
in [6] (for lower bounds). In these approaches the time complexity of a clause can 
be bounded by the time complexity of head unification together with the time 
complexity of each of its body literals. For simplicity, the discussion that follows 
is focused on the estimation of upper bounds. We refer the reader to [6] for 
details on lower bounds analysis. Consider a clause $C$ defined as "$H$ :- $L_1, \ldots, L_m$". Because of backtracking, the number of times a literal will be executed depends on the number of solutions that the literals preceding it can generate. Assume that $\bar{n}$ is a vector such that each element corresponds to the size of an input argument to clause $C$, and that each $\bar{n}_i$, $i = 1 \ldots m$, is a vector such that each element corresponds to the size of an input argument to literal $L_i$. Let $\tau$ be the cost needed to resolve the head $H$ of the clause with the literal being solved, and $\mathit{Sols}_{L_j}$ the number of solutions literal $L_j$ can generate. Then an upper bound on the cost of clause $C$ (assuming all solutions are required), $\mathit{Cost}_C(\bar{n})$, can be expressed as:

$$\mathit{Cost}_C(\bar{n}) \le \tau + \sum_{i=1}^{m} \Big( \prod_{j \prec i} \mathit{Sols}_{L_j}(\bar{n}_j) \Big)\, \mathit{Cost}_{L_i}(\bar{n}_i) \qquad (1)$$

Here we use $j \prec i$ to denote that $L_j$ precedes $L_i$ in the literal dependency graph for the clause.

Our current implementation also considers the cost of the terms created for the literals in the body of predicates, which can affect the cost expression significantly. To further simplify the discussion that follows, we restrict ourselves to the simple case where each literal is determinate, i.e., produces at most one solution. In this case, equation (1) simplifies to:

$$\mathit{Cost}_C(\bar{n}) \le \tau + \sum_{i=1}^{m} \mathit{Cost}_{L_i}(\bar{n}_i) \qquad (2)$$

(However, it is important to note that our implementation is not limited to deterministic programs: our system handles non-determinism, i.e., the presence of several solutions for a given call, in the cost analysis.)
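As a minimal illustration of equation (1), the following Python sketch evaluates the clause cost bound once the literal costs $\mathit{Cost}_{L_i}(\bar{n}_i)$ and solution counts $\mathit{Sols}_{L_j}(\bar{n}_j)$ are already available as numbers. The function name `clause_cost_upper_bound` and the `precedes` encoding of the literal dependency graph are illustrative assumptions, not part of CiaoPP. By default the sketch assumes a left-to-right chain in which every earlier literal precedes $L_i$.

```python
from math import prod


def clause_cost_upper_bound(tau, literals, precedes=None):
    """Upper bound on Cost_C following equation (1).

    tau      -- clause head cost
    literals -- list of (cost_i, sols_i) pairs for L_1..L_m, already
                evaluated at the literals' input sizes
    precedes -- optional dict i -> indices j with j < i in the literal
                dependency graph (hypothetical encoding); by default every
                earlier literal is assumed to precede L_i
    """
    total = tau
    for i, (cost_i, _) in enumerate(literals):
        preds = precedes.get(i, []) if precedes is not None else range(i)
        # Each solution of a preceding literal may cause L_i to be
        # executed once more on backtracking.
        times_executed = prod(literals[j][1] for j in preds)
        total += times_executed * cost_i
    return total


# L_1 costs 2 and has up to 3 solutions; L_2 costs 5.
print(clause_cost_upper_bound(1, [(2, 3), (5, 1)]))  # 18
```

When every solution count is 1 (the determinate case), the products are all 1 and the result coincides with equation (2).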

A difference equation is set up for each recursive clause, whose solution (using 
as boundary conditions the cost of non-recursive clauses) is a function that yields 
the cost of a clause. The cost of a predicate is then computed from the cost of 
its defining clauses. Since the number of solutions generated by a predicate that 
will be demanded is generally not known in advance, a conservative upper bound 
on the computational cost of a predicate can be obtained by assuming that all 
solutions are needed, and that all clauses are executed (thus the cost of the 
predicate is assumed to be the sum of the costs of its defining clauses). Taking 
mutual exclusion into account in order to obtain a more precise estimate of the 
cost of a predicate is relatively easy: the complexity for deterministic predicates 
can be approximated with the maximum of the costs of mutually exclusive groups 
of clauses. 
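As a worked example of the difference-equation step, consider the usual definition of append/3, with clauses append([], L, L). and append([H|T], L, [H|R]) :- append(T, L, R). The metric is the number of resolution steps ($\tau = 1$), and the size measure is the list length $n$ of the first argument. The following derivation is a hand calculation under these assumptions, not output of the analyzer. The non-recursive clause supplies the boundary condition, and equation (2) applied to the recursive clause gives the recurrence:

$$\begin{aligned}
\mathit{Cost}_{\mathit{append}}(0) &= 1 \\
\mathit{Cost}_{\mathit{append}}(n) &= 1 + \mathit{Cost}_{\mathit{append}}(n-1), \quad n > 0 \\
\Rightarrow\quad \mathit{Cost}_{\mathit{append}}(n) &= n + 1
\end{aligned}$$

The two clauses are mutually exclusive on the first argument, so only one of them applies to each call. This is why the recurrence uses the cost of a single clause at each level rather than the sum over both clauses.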

The analysis in [4, 5] was primarily aimed at estimating resolution steps. However, the basic metric is open and can be tailored to alternative scenarios: more sophisticated, accurate measures can be used instead of the initially proposed ones (e.g., number of basic unifications). In the rest of this section we explore this open issue more deeply and study how the original cost analysis can be extended in order to infer cost functions using more refined and parametric cost models, which in turn will enable accurate execution time bound analysis.

2.2 Proposed Platform-Dependent Cost Analysis Models 

Since the cost metric which we want to use in our approach is execution time, we take $\tau$ (in equation (2)) to include the time needed to resolve the head $H$ of the clause with the literal being solved, the cost associated with the resolution of the clause, and the cost coming from setting up the body literals for execution. In the following, we will refer to $\tau$ as the clause head cost function, under the assumption that these other costs are also taken into account. We will consider different values for $\tau$, each of them yielding a different cost model. These cost models make use of a vector of platform-dependent constants, together with a vector of platform-independent metrics, each one corresponding to a particular low-level operation related to program execution. Examples of such low-level operations considered by the cost models are unifications where one of the terms being unified is a variable, which thus behave as an "assignment", and full unifications, i.e., where neither of the terms being unified is a variable, and thus unification performs a "test" or produces new terms. Thus, we assume that $\tau$ is a function parameterized by the cost model, so that:

$$\tau(\Omega) = \mathit{time}(\Omega) \qquad (3)$$

where $\mathit{time}(\Omega)$ is a function that gives the time needed to resolve the head $H$ of the clause with the literal being solved (plus some possible costs associated with the execution of the clause, such as whether an activation record is allocated) for the cost model named $\Omega$. We study a family of cost models such that $\mathit{time}(\Omega)$ is a function defined as follows:

$$\mathit{time}(\Omega) = \mathit{time}(\omega_1) + \cdots + \mathit{time}(\omega_v), \quad v > 0 \qquad (4)$$

where each $\mathit{time}(\omega_i)$ provides that part of the execution time which depends on the metric $\omega_i$. We assume that:

$$\mathit{time}(\omega_i) = K_{\omega_i} \times I(\omega_i) \qquad (5)$$

where $K_{\omega_i}$ is a platform-dependent constant, and $I(\omega_i)$ is a platform-independent cost function.

Since $\mathit{time}(\Omega)$ is a linear combination of platform-independent cost functions, we can write equation (4) as:

$$\mathit{time}(\Omega) = K_\Omega \cdot I(\Omega) \qquad (6)$$

where $K_\Omega$ is a vector of platform-dependent constants, $I(\Omega)$ is a vector of platform-independent cost functions, and $\cdot$ is the dot product.

Accordingly, we generalize the definition of equation (2) by introducing the clause head cost function $\tau$ as a parameter:

$$\mathit{Cost}_C(\tau, \bar{n}) \le \tau + \sum_{i=1}^{m} \mathit{Cost}_{L_i}(\tau, \bar{n}_i) \qquad (7)$$

A particular definition of $\tau$ yields a cost model. We have tried several cost models by using different vectors $I(\Omega)$, constructed by choosing some (or all) of the following $I(\omega_i)$ cost functions. For example, the cost model that uses all of them except $I(\mathit{nargs})$ is $I(\Omega) = (I(\mathit{step}), I(\mathit{viunif}), I(\mathit{vounif}), I(\mathit{giunif}), I(\mathit{gounif}))$. In the following, an input argument is one for which the term being passed by the calling literal is known to be non-var at the time of head unification. An output argument is one for which the term being passed by the calling literal is known to be a variable at the time of head unification. Whether unifications are input or output can be inferred using well-known techniques for mode analysis (in our case, those provided by CiaoPP).

- $I(\mathit{step}) = 1$.

Here we assume that there is a constant component of the execution time when a clause is resolved (i.e., when a clause neck is crossed). Following equation (5), we are assuming for this component that:

$$\mathit{time}(\mathit{step}) = K_{\mathit{step}}$$

- $I(\mathit{vounif})$ = the number of variables in the clause head which correspond to "output" argument positions.

Here we assume that there is a component of the execution time that is directly proportional to the number of cases where we know that both terms being unified are variables, and thus unification really implies a simple assignment with a (presumably small) constant cost:

$$\mathit{time}(\mathit{vounif}) = K_{\mathit{vounif}} \times I(\mathit{vounif})$$

- $I(\mathit{viunif})$ = the number of variables in the clause head which correspond to "input" argument positions.

Here we assume that there is a component of the execution time that is directly proportional to the number of cases where we know that the incoming term is non-var and the argument position in the clause is a variable. In this case the head unification for that argument is also an assignment with a small, constant cost, and there is also a cost associated with creating the input argument at the calling point, which for simplicity we also consider constant. Given these assumptions:

$$\mathit{time}(\mathit{viunif}) = K_{\mathit{viunif}} \times I(\mathit{viunif})$$

- $I(\mathit{gounif})$ = the number of function symbols, constants, and variables in the clause head which appear in output arguments.

We are assuming that there is a component of the execution time that is directly proportional to the size of the terms that have to be written into variables passed in by the calling literal, and which is proportional to the number of function symbols, constants, and variables which appear in output arguments in the clause head:

$$\mathit{time}(\mathit{gounif}) = K_{\mathit{gounif}} \times I(\mathit{gounif})$$

- $I(\mathit{giunif})$ = the number of function symbols, variables, and constants in the clause head which appear in input arguments.

Here we are assuming that there is a component of the execution time that is directly proportional to the number of "input" unifications, i.e., where neither of the terms being unified is a variable, and thus unification performs a "test". This component is approximated as proportional to the number of function symbols, variables, and constants in the clause head which appear in input arguments (this is obviously an approximation):

$$\mathit{time}(\mathit{giunif}) = K_{\mathit{giunif}} \times I(\mathit{giunif})$$

- $I(\mathit{nargs}) = \mathit{arity}(H)$.

Here we are assuming that there is a component of the execution time that depends on the number of arguments in the clause head:

$$\mathit{time}(\mathit{nargs}) = K_{\mathit{nargs}} \times \mathit{arity}(H) \qquad (8)$$

This component is obviously redundant with respect to the previous ones, but we have included it as a statistical control: the experiments should show (and do show) that it is irrelevant when the others are used.
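To make the head-related metrics concrete, the following Python sketch counts them for a single clause head and then applies equation (6). It rests on an interpretation that the text leaves open, so treat it as an assumption. Under that interpretation, a head argument that is a bare variable counts towards viunif or vounif. Every other argument contributes all of its function symbols, constants, and variables to giunif or gounif. The term encoding, the `Var` class, and the function names are hypothetical. The constants are the intel values reported in Table 2 for the model step giunif gounif viunif vounif, in nanoseconds. The example head is that of the recursive append/3 clause, append([H|T], L, [H|R]) :- append(T, L, R).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logic variable; compound terms are tuples (functor, arg1, ..., argk)."""
    name: str


def count_symbols(term):
    """Number of function symbols, constants, and variables in a term."""
    if isinstance(term, tuple):
        return 1 + sum(count_symbols(a) for a in term[1:])
    return 1  # a variable or a constant


def head_metrics(args, modes):
    """Platform-independent vector I(Omega) for one clause head.

    modes[i] is "in" or "out". Assumption: a bare-variable argument counts
    towards viunif/vounif; any other argument adds all of its symbols to
    giunif/gounif.
    """
    m = {"step": 1, "nargs": len(args), "giunif": 0, "gounif": 0,
         "viunif": 0, "vounif": 0}
    for arg, mode in zip(args, modes):
        if isinstance(arg, Var):
            m["viunif" if mode == "in" else "vounif"] += 1
        else:
            m["giunif" if mode == "in" else "gounif"] += count_symbols(arg)
    return m


def head_time(metrics, constants):
    """Equation (6): time(Omega) = K_Omega . I(Omega) over the model's components."""
    return sum(k * metrics[name] for name, k in constants.items())


# Recursive clause head append([H|T], L, [H|R]) with modes (in, in, out).
H, T, L, R = Var("H"), Var("T"), Var("L"), Var("R")
rec = head_metrics(((".", H, T), L, (".", H, R)), ("in", "in", "out"))
K_INTEL = {"step": 26.56, "giunif": 10.81, "gounif": 8.60,
           "viunif": 6.17, "vounif": 6.39}  # ns, intel rows of Table 2
print(rec, head_time(rec, K_INTEL))
```

Under these assumptions, the recursive head gives giunif = 3, gounif = 3, viunif = 1, and vounif = 0. Its predicted head time is about 90.96 ns with the intel constants.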



Clearly, other components can be included (such as whether activation records 
are created or not) but our objective is to see how far we can go with the com- 
ponents outlined above. 

We adopt the same approach as [5, 6] for computing bounds on the cost of predicates from the computed values for the cost of the clauses defining them. However, we introduce the clause head cost function $\tau$ as a parameter of these cost functions.



Let $\mathit{Cost}_p(\tau, \bar{n})$ be a function which gives the cost of the computation of a call to predicate $p$ for an input of size $\bar{n}$ (recall that the cost units depend on the definition of $\tau$). Given a predicate $p$ and a clause head cost function $\mathit{time}(\Omega)$ of the form defined in equation (6), we have that:

$$\mathit{Cost}_p(\mathit{time}(\Omega), \bar{n}) = K_\Omega \cdot \mathit{Cost}_p(I(\Omega), \bar{n}) \qquad (9)$$

where $K_\Omega$, $I(\Omega)$ and $\mathit{Cost}_p(I(\Omega), \bar{n})$ are vectors of the form:

$$K_\Omega = (K_{\omega_1}, \ldots, K_{\omega_v}), \qquad I(\Omega) = (I(\omega_1), \ldots, I(\omega_v)),$$
$$\mathit{Cost}_p(I(\Omega), \bar{n}) = (\mathit{Cost}_p(I(\omega_1), \bar{n}), \ldots, \mathit{Cost}_p(I(\omega_v), \bar{n}))$$

Equation (9) gives the basis for computing values for the constants $K_{\omega_i}$ via profiling (as explained in Section 4). It also provides a way to obtain the cost of a procedure expressed in a platform-dependent cost metric from another cost expressed in a platform-independent cost metric.
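Equation (9) can be illustrated with the usual append/3 definition, append([], L, L). and append([H|T], L, [H|R]) :- append(T, L, R), called with modes (in, in, out). The sketch rests on a counting assumption of its own, which is an interpretation and not CiaoPP's documented counting. A bare-variable head argument counts towards viunif or vounif. Every other argument contributes all its symbols to giunif or gounif. Under this assumption, each of the $n$ recursive calls contributes the head vector (1, 3, 3, 1, 0) in the order (step, giunif, gounif, viunif, vounif). The final base-clause call contributes (1, 1, 0, 1, 1). Summing these gives $\mathit{Cost}_{\mathit{append}}(I(\Omega), n)$, and the dot product with $K_\Omega$ yields a time bound. Builtins and term-construction costs are ignored, the function names are hypothetical, and the constants are the intel values from Table 2.

```python
# Platform constants (ns) for the model step giunif gounif viunif vounif,
# taken from the intel rows of Table 2.
K_INTEL = {"step": 26.56, "giunif": 10.81, "gounif": 8.60,
           "viunif": 6.17, "vounif": 6.39}


def append_cost_vector(n):
    """Hand-derived Cost_append(I(omega_i), n), n = length of the first list.

    n recursive heads contribute (1, 3, 3, 1, 0) each; the single
    base-clause head contributes (1, 1, 0, 1, 1).
    """
    return {"step": n + 1, "giunif": 3 * n + 1, "gounif": 3 * n,
            "viunif": n + 1, "vounif": 1}


def predicted_time(constants, cost_vector):
    """Equation (9): Cost_p(time(Omega), n) = K_Omega . Cost_p(I(Omega), n)."""
    return sum(k * cost_vector[name] for name, k in constants.items())


for n in (0, 10, 100):
    print(f"n = {n:3d}: {predicted_time(K_INTEL, append_cost_vector(n)):.2f} ns")
```

The resulting bound is linear in $n$ (here $49.93 + 90.96\,n$ ns). Moving to another platform only changes the constant vector $K_\Omega$, not the platform-independent cost vector.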

3 Refining the Cost Model: Dealing with Builtins 

In this section we present our approach to the cost analysis of programs which call builtins, or more generally, predicates whose code is not available to the analyzer (external predicates). We will refer to all of them as builtins for brevity. We assume that there is a cost function (expressed via trust assertions [9]) for
builtin predicates. In some cases, this cost function for each builtin predicate 
is approximated by a constant value, and in others, it is approximated by a 
function that depends on properties of the (input) arguments of the predicate. 
In particular, the cost of arithmetic builtin predicates (such as =:=/2, =\=/2, 
or >/2) is approximated by a function that depends on the number and type of 
arithmetic operands appearing in the arithmetic expressions that can be passed 
to such predicates as arguments. 

Note that this is an important improvement over the cost analysis proposed 
in [5] (which infers number of resolution steps), since one of the assumptions 
made in such analysis is that calls to certain builtin predicates are not counted 
as a resolution step, and are thus completely ignored by cost analysis. This 
assumption is not realistic if we want to estimate execution times, since the cost 
of executing such builtins has to be taken into account. 

Going into more detail, we assume that each builtin contributes a new component to the execution time as expressed in equation (4), that is, our cost model will have a new component in $\Omega$ for each builtin predicate and arithmetic operator. Let $\odot/n$ be an arithmetic operator. The execution time due to the total number of times that such operator is evaluated is given by:

$$\mathit{time}(\odot/n) = K_{\odot/n} \times I(\odot/n)$$

where $K_{\odot/n}$ is a platform-dependent constant, and $I(\odot/n)$ is a platform-independent cost function. $K_{\odot/n}$ approximates the cost (in units of time) of evaluating the arithmetic operator $\odot/n$. $I(\odot/n)$ could be the number of times that the arithmetic operator is evaluated. Alternatively, it can be a cost function defined as:

$$I(\odot/n) = \sum_{a \in S} \mathit{EvCost}(\odot/n, a)$$

where $S$ is the set of arithmetic expressions appearing in the clause body which will be evaluated, and $\mathit{EvCost}(\odot/n, a)$ represents the cost corresponding to the operator $\odot/n$ in the evaluation of the arithmetic term $a$, i.e.:

$$\mathit{EvCost}(\odot/n, A) = \begin{cases} 0 & \text{if } A \text{ is a constant or a variable} \\ 1 + \sum_{i=1}^{n} \mathit{EvCost}(\odot/n, A_i) & \text{if } A = \odot(A_1, \ldots, A_n) \\ \sum_{i=1}^{m} \mathit{EvCost}(\odot/n, A_i) & \text{if } A = \otimes(A_1, \ldots, A_m) \text{ for some operator } \otimes/m \neq \odot/n \end{cases}$$

For simplicity, we assume that the cost of evaluating the arithmetic term $t$ to which a variable appearing in $A$ will be bound at execution time is zero (i.e., we ignore the cost of evaluating $t$). This is a good approximation if in most cases $t$ is a number and thus no evaluation is needed for it. However, a more refined cost model could assume that this cost is a function of the size of $t$.
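The recursive definition of $\mathit{EvCost}$ translates directly into code. The following Python sketch encodes arithmetic terms as tuples (operator, arg1, ..., argk) and treats every non-tuple as cost-free, matching the first case of the definition. Non-tuples include numbers and strings standing for variables. The function names and the example expression are hypothetical.

```python
def ev_cost(op, arity, term):
    """EvCost(op/arity, A): evaluations of op/arity performed when evaluating A."""
    if not isinstance(term, tuple):
        return 0  # constant or variable
    functor, args = term[0], term[1:]
    inner = sum(ev_cost(op, arity, a) for a in args)
    if functor == op and len(args) == arity:
        return 1 + inner  # A = op(A_1, ..., A_n)
    return inner  # A is built with some other operator


def operator_metric(op, arity, expressions):
    """I(op/arity) as the sum of EvCost over the set S of evaluated expressions."""
    return sum(ev_cost(op, arity, a) for a in expressions)


# S for a hypothetical body literal  X is A*B + C*2
S = [("+", ("*", "A", "B"), ("*", "C", 2))]
for op in ("*", "+", "-"):
    print(op, operator_metric(op, 2, S))
```

For this $S$, the counts are 2 for */2, 1 for +/2, and 0 for -/2. Because the arity is part of the match, unary minus (-/1) and binary minus (-/2) are counted separately.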

Note that this model ignores the possible optimizations that the compiler might perform. We can take into account those performed by source-to-source transformation by placing our analyses in the last stage of the front-end. At some point, however, the language the compiler works with would become different enough to require different considerations in the cost model.

However, experimental results show that our simplified cost model gives a good approximation of the execution times for arithmetic builtin predicates. With these assumptions, equation (9) (in Section 2.2) also holds for programs that call builtin predicates. For example, for a builtin $b/n$, this is achieved by introducing $b/n$ and $\odot/n$ as new cost components of $\Omega$.

A similar approach can be used for other (non-arithmetic) builtins $b/n$ using the formula:

$$\mathit{time}(b/n) = K_{b/n} \times I(b/n)$$



4 Calibrating Constants via Profiling 

In order to compute values for the platform-dependent constants which appear in the different cost models proposed in Section 2.2, our calibration scheme takes advantage of the relationship between the platform-dependent and platform-independent cost metrics expressed in equation (9). In this sense, the calibration of the constants appearing in $K_\Omega$ is performed by solving systems of linear equations (in which such constants are treated as variables). Based on this expression, the calibration procedure consists of:

1. Using a selected set of calibration programs which aim at isolating specific aspects that affect the execution time of programs in general. For these calibration programs it holds that $\mathit{Cost}_p(I(\omega_i), \bar{n})$ is known for all $1 \le i \le v$. This can be done by using any of the following methods:

   - The analyzers integrated in the CiaoPP system infer the exact cost function, i.e., the inferred lower and upper bounds coincide: $\mathit{Cost}^l_p(I(\omega_i), \bar{n}) = \mathit{Cost}^u_p(I(\omega_i), \bar{n}) = \mathit{Cost}_p(I(\omega_i), \bar{n})$,

   - $\mathit{Cost}_p(I(\omega_i), \bar{n})$ is computed by a profiler tool, or

   - $\mathit{Cost}_p(I(\omega_i), \bar{n})$ is supplied by the user together with the code of program $p$ (i.e., the cost function is not the result of any automatic analysis; rather, $p$ is well known and its cost function can be supplied in a trust assertion).

2. For each benchmark p in this set, automatically generating a significant 
amount m of input data for it. This can be achieved by associating with 
each calibration program a data generation rule. 

3. For each generated input data $d_j$, computing a pair $(C_{p_j}, T_{p_j})$, $1 \le j \le m$, where:

   - $T_{p_j}$ is the $j$-th observed execution time of program $p$ with this generated input data.

   - $C_{p_j} = \mathit{Cost}_p(I(\Omega), n_j)$, where $n_j$ is the size of the $j$-th input data $d_j$.

4. Using the set of pairs $(C_{p_j}, T_{p_j})$ for setting up the equation:

$$C_{p_j} \cdot K_\Omega = T_{p_j} \qquad (10)$$

where $K_\Omega$ is considered a vector of variables.

5. Setting up the (overdetermined) system of equations composed by putting together all the equations (10) corresponding to all the calibration programs.

6. Solving the above system of equations using the least squares method (see, e.g., [14]). A solution to this system gives values to the vector $K_\Omega$ and hence to the constants $K_{\omega_i}$ which are its elements.

7. Calculating the constants for builtins and arithmetic operators by performing 
repeated tests in which only the builtin being tested is called, accumulating 
the time, and dividing the accumulated time by the number of times the 
repeated test has been performed. 
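Steps 3 to 6 can be sketched in Python with NumPy. Real calibration programs cannot be run here, so the sketch simulates observed times from hypothetical "true" constants plus Gaussian noise. The two cost functions, the constants, and all function names are illustrative assumptions. `numpy.linalg.lstsq` stands in for whichever least squares solver is actually used.

```python
import numpy as np


def build_system(programs, sizes_per_program):
    """Stack one equation (10), C_pj . K = T_pj, per program and input size.

    programs: list of (cost_fn, run_fn); cost_fn(n) returns the vector
    Cost_p(I(Omega), n) and run_fn(n) returns an observed time for size n.
    """
    rows, times = [], []
    for (cost_fn, run_fn), sizes in zip(programs, sizes_per_program):
        for n in sizes:
            rows.append(cost_fn(n))
            times.append(run_fn(n))
    return np.array(rows, dtype=float), np.array(times, dtype=float)


def calibrate(C, T):
    """Least squares solution K_Omega of the overdetermined system C K = T."""
    K, *_ = np.linalg.lstsq(C, T, rcond=None)
    return K


# --- Simulated calibration run (all values hypothetical) ---
rng = np.random.default_rng(0)
K_TRUE = np.array([25.0, 10.0])  # pretend constants for (step, giunif), in ns


def step_only(n):  # a program exercising resolution steps only
    return [n + 1, 0]


def step_and_giunif(n):  # a program that also unifies input terms
    return [n + 1, 3 * n + 1]


def simulated_run(cost_fn):
    # Stand-in for timing a real benchmark: true cost plus measurement noise.
    return lambda n: float(K_TRUE @ np.array(cost_fn(n))) + rng.normal(0.0, 1.0)


programs = [(f, simulated_run(f)) for f in (step_only, step_and_giunif)]
C, T = build_system(programs, [range(1, 51)] * 2)
K = calibrate(C, T)
print("estimated K:", K.round(2))
```

The calibration programs are chosen so that their cost vectors are linearly independent. Otherwise the individual constants could not be separated, however many input sizes are measured.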

5 Assessment of the Calibration of Constants 

We have assessed both the constant calibration process and the prediction of execution times using the previously proposed cost models on two different platforms:

- "intel" platform: Dell Optiplex, Pentium 4 (Hyper-Threading), 2 GHz, 512 MB RAM, Fedora Core 4 operating system with kernel 2.6.

- "ppc" platform: Apple iMac, PowerPC G4 (1.1) 1.5 GHz, 1 GB RAM, Mac OS X 10.4.5 Tiger.



Program
  Environments creation
  Predicates with no arguments
  Traverse a list without last call optimization
  Traverse a list with last call optimization
  Program for which I(viunif) is known
  Program for which I(vounif) is known
  Program (unifying deep terms) for which I(giunif) is known
  Program (unifying flat terms) for which I(giunif) is known
  Program for which I(gounif) is known
  Predicate with many arguments

Table 1. Description of calibration programs used in the estimation of constants.



In Section 4 we presented equation (10) and mentioned that the resulting system can be solved using the least squares method. We used the Householder algorithm, which consists of decomposing the matrix $C = \{C_{p_j}\}$, which has $m$ rows and $n$ columns, into the product of two matrices $Q$ and $U$ (the product is denoted by $\cdot$ or by juxtaposition). The decomposition satisfies $C = Q \cdot U$, where $Q$ is an orthogonal matrix (i.e., $Q^T \cdot Q = I$, the $m \times m$ identity matrix) and $U$ is an upper triangular $m \times n$ matrix. Then, multiplying both sides of the stacked system $C \cdot K = T$ (equation (10) for all $j$) by $Q^T$ and simplifying, we get:

$$U \cdot K = Q^T \cdot T = B$$

where, for clarity, we write $K = K_\Omega$, $T$ for the vector of observed times $T_{p_j}$, and $B = Q^T \cdot T$. We can take advantage of the structure of $U$: let $V$ be the first $n$ rows of $U$ ($n$ being the number of columns of $C$), and let $b$ be the first $n$ rows of $B$. Then $K$ can be estimated by solving the following upper triangular system, where $\hat{K}$ stands for the estimate of $K$:

$$V \cdot \hat{K} = b$$

Since this method is being used to find an approximate solution, we define the residual of the system as

$$R = T - C \hat{K}$$

Let $\mathit{RSS} = R \cdot R$ be the residual sum of squares, let

$$\mathit{MRSS} = \frac{\mathit{RSS}}{m - n}$$

be the mean residual sum of squares, where $m$ and $n$ are the number of rows and columns of the matrix $C$ respectively, and finally let

$$S = \sqrt{\mathit{MRSS}}$$

be the estimate of the model standard error. In order to experimentally evaluate which models better approximate the observed time in practice, we have compared the values of MRSS (or $S$) for several proposed models. Table 2 shows the estimated values for the vector $K$ using the calibration programs in Table 1, as well as the standard error of each model, sorted from the best to the worst model. For example, the first row in the table shows the model that has as components step, nargs, giunif, gounif, viunif, and vounif for the intel platform. It has a standard error of 6.2475 µs, and the values for each of the constants are 21.27, 9.96, 10.30, 8.23, 6.46, and 5.69 nanoseconds, respectively.

Plat.  Model                                          S (µs)    K_Ω (ns)
intel  step nargs giunif gounif viunif vounif         6.2475    (21.27, 9.96, 10.30, 8.23, 6.46, 5.69)
       step giunif gounif viunif vounif               9.3715    (26.56, 10.81, 8.60, 6.17, 6.39)
       step giunif gounif vounif                      13.7277   (27.95, 11.09, 8.77, 7.40)
       step                                           68.3088   108.90
ppc    step nargs giunif gounif viunif vounif         4.7167    (41.06, 5.21, 16.85, 15.14, 9.58, 9.92)
       step giunif gounif viunif vounif               5.9676    (43.83, 17.12, 15.33, 9.43, 10.29)
       step giunif gounif vounif                      16.4511   (45.95, 17.55, 15.59, 11.82)
       step                                           116.0289  183.83

Table 2. Global values for vector constants in several cost models (in nanoseconds), sorted by S, the standard error of the model.

Note that the estimation of K is done just once per platform. In the case of 
the intel platform it took 15.62 seconds and in ppc 17.84 seconds, repeating the 
experiment 250 times for each program. 
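The QR-based solution and the standard error $S$ described above can be reproduced with NumPy. In its default reduced mode, `numpy.linalg.qr` returns an $m \times n$ matrix $Q$ and the $n \times n$ upper triangular block, which plays the role of $V$. LAPACK computes this factorization with Householder reflections. With the reduced $Q$, the product $Q^T T$ is exactly $b$. The data below are random stand-ins, not the calibration measurements, and the function names are hypothetical.

```python
import numpy as np


def back_substitute(V, b):
    """Solve the upper triangular system V x = b by back substitution."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - V[i, i + 1:] @ x[i + 1:]) / V[i, i]
    return x


def fit_with_standard_error(C, T):
    """Return (K_hat, S) for the overdetermined system C K = T via QR."""
    m, n = C.shape
    Q, V = np.linalg.qr(C)  # reduced QR: Q is m x n, V is n x n upper triangular
    b = Q.T @ T  # the first n entries of B = Q^T T
    K_hat = back_substitute(V, b)
    R = T - C @ K_hat  # residual
    rss = R @ R  # residual sum of squares
    mrss = rss / (m - n)  # mean residual sum of squares
    return K_hat, np.sqrt(mrss)  # S, the estimated model standard error


rng = np.random.default_rng(1)
C = rng.uniform(1.0, 100.0, size=(40, 3))  # hypothetical cost vectors
T = C @ np.array([20.0, 9.0, 7.0]) + rng.normal(0.0, 2.0, size=40)
K_hat, S = fit_with_standard_error(C, T)
print("K_hat:", K_hat.round(2), " S:", round(float(S), 3))
```

The divisor $m - n$ requires more equations than constants. This is one reason to measure each calibration program on many inputs.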

6 Assessment of the Prediction of Execution Times 

We have tested the implementation of the proposed cost models in order to assess how well they statically predict the execution time of other programs (not used in the calibration process), without performing any runtime profiling with them. We have performed experiments with all of the 63 possible cost models that result from combining one or more of the six components described in Section 2.2. However, for space reasons and for clarity, we only show the three most accurate cost models (according to a global accuracy comparison presented later) plus the step model, which is of special interest, as we will also see later. Experimental results are shown in Table 3. Column Prog. lists the program names. The analyzers integrated in the CiaoPP system infer the exact cost function for all the programs in that table under the $I(\omega_i)$ metric. This means that the upper and lower bounds are the same, i.e., $\mathit{Cost}^l_p(I(\omega_i), \bar{n}) = \mathit{Cost}^u_p(I(\omega_i), \bar{n}) = \mathit{Cost}_p(I(\omega_i), \bar{n})$. There are several rows for each program in the table. The first three rows show the predicted execution times for the three most accurate cost models. The fourth row shows the prediction obtained by the cost model step, which only considers resolution steps. That is, it assumes that the execution time of a procedure call is directly proportional to the number of resolution steps performed by the call. This means that for this simple cost model we are assuming that $\mathit{time}(\mathit{step}) = K_{\mathit{step}}$, since $I(\mathit{step}) = 1$, for a constant $K_{\mathit{step}}$ which represents the time taken by a resolution step. Note that $\mathit{Cost}_C(I(\mathit{step}), \bar{n})$ gives the number of resolution steps performed by clause $C$. The row labeled Observed presents the observed (i.e., measured) execution times and allows measuring the accuracy of the different predictions. In this sense, the values in the Model column are the names of the four cost models, and the value Observed identifies the row corresponding to the observed values. The following two columns show results corresponding to the "intel" and "ppc" execution platforms.

Prog.   Model                                     intel Estimate   ppc Estimate
                                                  µs (dev. %)      µs (dev. %)
evpol   step nargs giunif gounif viunif vounif    89.72 (44)       77.4 (23)
        step giunif gounif viunif vounif          85.06 (38)       74.96 (26)
        step giunif gounif vounif                 82 (35)          70.28 (33)
        step                                      90.12 (45)       85.07 (13)
        Observed                                  58.43            97.08
        Analysis time T_ca (s)                    2.002            4.461
hanoi   step nargs giunif gounif viunif vounif    319 (31)         398.5 (4)
        step giunif gounif viunif vounif          243.3 (3)        358.8 (7)
        step giunif gounif vounif                 205.6 (14)       301.3 (25)
        step                                      340.7 (38)       538.6 (34)
        Observed                                  235.3            384.2
        Analysis time T_ca (s)                    2.145            4.903
nrev    step nargs giunif gounif viunif vounif    131.3 (68)       179.4 (26)
        step giunif gounif viunif vounif          101.1 (39)       163.6 (16)
        step giunif gounif vounif                 82.51 (18)       135.2 (3)
        step                                      144.4 (80)       243.8 (59)
        Observed                                  69.25            139.2
        Analysis time T_ca (s)                    2.022            4.691
palind  step nargs giunif gounif viunif vounif    131.8 (18)       179.8 (5)
        step giunif gounif viunif vounif          101 (9)          163.7 (5)
        step giunif gounif vounif                 86.91 (24)       142.1 (19)
        step                                      167.2 (43)       282.2 (52)
        Observed                                  110              171.6
        Analysis time T_ca (s)                    2                4.7
powset  step nargs giunif gounif viunif vounif    537.5 (59)       727.9 (17)
        step giunif gounif viunif vounif          404.5 (28)       658.3 (7)
        step giunif gounif vounif                 323.8 (5)        534.9 (14)
        step                                      448.7 (38)       757.4 (21)
        Observed                                  308.2            615
        Analysis time T_ca (s)                    2.07             4.636
append  step nargs giunif gounif viunif vounif    50.29 (75)       68.72 (24)
        step giunif gounif viunif vounif          38.69 (44)       62.65 (15)
        step giunif gounif vounif                 31.36 (22)       51.45 (5)
        step                                      54.56 (85)       92.1 (56)
        Observed                                  25.16            53.92
        Analysis time T_ca (s)                    1.932            4.441

Table 3. Evaluation of execution time predictions.



Platform  Model                                     Error (%)
intel     step nargs giunif gounif viunif vounif    53.17
          step giunif gounif viunif vounif          31.06
          step giunif gounif vounif                 21.48
          step                                      58.45
ppc       step nargs giunif gounif viunif vounif    18.72
          step giunif gounif viunif vounif          14.66
          step giunif gounif vounif                 19.44
          step                                      43.04



Table 4. Global comparison of the accuracy of cost models. 



Column Estimate shows the execution times computed using the average value of 
the constant K_η as estimated in Table 2: 

$$Estimate = K_{\eta} \bullet Cost_p(I(\bar{y}), \eta)$$

Deviations with respect to the observed values (given in the Observed row) are 
shown in parentheses in the Estimate column. 
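As a minimal illustration of how an estimate is obtained from the constants K and the cost Cost_p, the Python sketch below assumes (as the • in the formula suggests) that both are vectors indexed by the components of a cost model, here step, giunif, gounif, viunif, and vounif, so that the estimate is their dot product. The constant values and operation counts are invented for illustration and are not the values of Table 2; the function name `estimate` is hypothetical.

```python
# Components of the cost model "step giunif gounif viunif vounif".
COMPONENTS = ("step", "giunif", "gounif", "viunif", "vounif")


def estimate(k: dict[str, float], cost: dict[str, float]) -> float:
    """Estimated time: dot product of per-component constants and component counts."""
    return sum(k[c] * cost[c] for c in COMPONENTS)


# Hypothetical per-operation constants (e.g. in microseconds) for one platform.
k = {"step": 0.2, "giunif": 0.05, "gounif": 0.1, "viunif": 0.03, "vounif": 0.08}
# Hypothetical operation counts, as a cost analysis might infer them for one call.
cost = {"step": 100, "giunif": 50, "gounif": 40, "viunif": 60, "vounif": 30}

print(round(estimate(k, cost), 6))  # prints 30.7
```

Under this reading, a single-component model such as step reduces to one constant multiplied by the number of resolution steps.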

The observed execution times were measured by running the programs on input 
data of a fixed size. Ten input data sets of that size were generated randomly, 
and the program was run 5 times on each data set. The observed execution time 
for that input size is the average over all 50 runs. 
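The measurement protocol can be sketched as follows. This is only an illustration under stated assumptions: the paper's benchmarks are logic programs run on the two platforms, whereas here any Python callable stands in for the program, and the random integer lists are a hypothetical stand-in for the input data generator. The function name `observed_time` is likewise hypothetical.

```python
import random
import statistics
import time
from typing import Callable


def observed_time(run: Callable[[list[int]], object], size: int,
                  n_datasets: int = 10, n_runs: int = 5, seed: int = 0) -> float:
    """Average wall-clock time of `run` over n_datasets random inputs of a fixed
    size, each executed n_runs times (10 x 5 = 50 measurements by default).
    `run` is assumed not to modify its input."""
    rng = random.Random(seed)
    samples: list[float] = []
    for _ in range(n_datasets):
        # Hypothetical input generator: a random list of integers of the given size.
        data = [rng.randint(0, 1000) for _ in range(size)]
        for _ in range(n_runs):
            start = time.perf_counter()
            run(data)
            samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


print(observed_time(sorted, size=10_000))
```

Averaging over several data sets, rather than repeating a single input, reduces the influence of any one unusually easy or hard input on the observed value.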

Row T_ca shows the total (static) cost analysis time, in seconds, needed to 
perform the execution time estimation (including mode, type, and cost analysis). 

Table 4 compares the overall accuracy of the four cost models of Table 3 on the 
two platforms considered. The last column shows the global error, which indicates 
how much the execution times estimated by each cost model deviate from the 
observed values. As global error we take the root mean square (quadratic mean) 
of the errors for the individual examples in Table 3. Considering both platforms 
together, the most accurate cost model is the one consisting of step, giunif, 
gounif, viunif, and vounif: it ranks first on "ppc" and second on "intel", with an 
overall error of 14.66% on "ppc" and 31.06% on "intel". On "intel" (apparently 
the more challenging platform, since every model has a larger error there) the 
model consisting of step, giunif, gounif, and vounif is the best. This agrees with 
our intuition that taking into account a comparatively large number of 
lower-level operations should improve accuracy. However, such components must 
contribute significantly to the model in order to avoid introducing noise. As 
expected, including nargs in the cost model does not improve accuracy further, 
since nargs is not independent of the four components giunif, gounif, viunif, 
and vounif. In fact, including this component yields a less precise model on 
both platforms, due to the noise it introduces. Finally, the cost model step 
deserves special mention: it is the simplest one and, at least for the given 
examples, its error is smaller than we expected and smaller than that of more 
complex cost models not shown in the tables. 
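The global error of Table 4 can be recomputed from the per-benchmark deviations of Table 3. The sketch below does so with the root mean square; since the deviations in Table 3 are rounded to whole percentages, the recomputed values match Table 4 only to within a few tenths of a percentage point. The variable and function names are ours, chosen for illustration.

```python
import math

# Rounded deviations (%) from Table 3, one per benchmark, in table order:
# the first benchmark, then hanoi, nrev, palind, powset, append.
DEVIATIONS = {
    "intel": {
        "step nargs giunif gounif viunif vounif": [44, 31, 68, 18, 59, 75],
        "step giunif gounif viunif vounif": [38, 3, 39, 9, 28, 44],
        "step giunif gounif vounif": [35, 14, 18, 24, 5, 22],
        "step": [45, 38, 80, 43, 38, 85],
    },
    "ppc": {
        "step nargs giunif gounif viunif vounif": [23, 4, 26, 5, 17, 24],
        "step giunif gounif viunif vounif": [26, 7, 16, 5, 7, 15],
        "step giunif gounif vounif": [33, 25, 3, 19, 14, 5],
        "step": [13, 34, 59, 52, 21, 56],
    },
}

# Global errors (%) as reported in Table 4, in the same model order.
TABLE4 = {
    "intel": [53.17, 31.06, 21.48, 58.45],
    "ppc": [18.72, 14.66, 19.44, 43.04],
}


def global_error(errors: list[float]) -> float:
    """Root mean square (quadratic mean) of the per-benchmark errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


for platform, models in DEVIATIONS.items():
    for (model, errs), reported in zip(models.items(), TABLE4[platform]):
        print(f"{platform:5} {model:40} recomputed {global_error(errs):6.2f}"
              f"  Table 4 {reported:6.2f}")
```

The largest gap is for step giunif gounif vounif on "intel" (21.72 recomputed versus 21.48 reported), which is within what rounding of the Table 3 deviations can explain. The recomputation also confirms the ranking discussed above: the lowest error on "ppc" is for step giunif gounif viunif vounif and on "intel" for step giunif gounif vounif.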



Overall, we believe the results are very encouraging: our combined framework 
predicts the execution times of programs with an acceptable degree of accuracy, 
and it paves the way for even more accurate analyses that include additional 
parameters. 

7 Applications 

The experimental results presented in Section 6 show that the proposed framework 
can be relevant in practice for estimating platform-dependent cost metrics such 
as execution time. We believe that execution time estimates can be very useful 
in several contexts. As already mentioned, certain mobile/pervasive computation 
scenarios involve different platforms, each with different capabilities. More 
concretely, execution time estimates could be useful for performing 
resource/granularity control in parallel/distributed computing. This belief is 
based on previous experimental results: the sensitivity of the results observed 
in those experiments suggested that, although it is not essential to be 
absolutely precise when inferring the best time estimates for a query, the number 
of reductions by itself is only a rough measure, and the current time estimation 
approach could presumably improve on those results. 
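To make the granularity control idea concrete, the sketch below uses a platform-dependent time function to compute, once, the smallest input size for which the estimated time of a goal exceeds the overhead of running it as a separate parallel task. Goals with smaller inputs would then be run sequentially. The time function, the overhead value, and the names `granularity_threshold` and `run_in_parallel` are hypothetical. The binary search assumes the time function is non-decreasing in the input size.

```python
from typing import Callable, Optional


def granularity_threshold(time_fn: Callable[[int], float], overhead: float,
                          max_size: int = 1 << 20) -> Optional[int]:
    """Smallest input size n with time_fn(n) > overhead, or None if no n <= max_size
    qualifies. Binary search; assumes time_fn is non-decreasing in n."""
    if time_fn(max_size) <= overhead:
        return None
    lo, hi = 0, max_size  # invariant: time_fn(hi) > overhead
    while lo < hi:
        mid = (lo + hi) // 2
        if time_fn(mid) > overhead:
            hi = mid
        else:
            lo = mid + 1
    return lo


def estimated_time(n: int) -> float:
    """Hypothetical platform-dependent time function (microseconds) for a quadratic goal."""
    return n * n / 20


def run_in_parallel(n: int, threshold: Optional[int]) -> bool:
    """Spawn a parallel task only for inputs at or above the threshold."""
    return threshold is not None and n >= threshold


threshold = granularity_threshold(estimated_time, overhead=500.0)
print(threshold)  # prints 101
```

Because the threshold is computed once from the cost function, the run-time test reduces to a single integer comparison, which keeps the control overhead itself small.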

One of the useful features of our approach is that it can translate 
platform-independent cost functions (which are the output of the analyzer) into 
platform-dependent cost functions (using the relationship in expression (9)). One 
application that can take advantage of this feature is mobile code safety, and in 
particular Proof-Carrying Code (PCC), a general approach in which the code 
supplier augments the program with a certificate (or proof). Consider a scenario 
in which the producer sends a certificate containing a platform-independent cost 
function (i.e., one where the cost is expressed in a platform-independent metric) 
together with a calibration program, which includes a fixed set of calibration 
benchmarks. The consumer runs the calibration program (only once) and computes 
the values of the constants appearing in the cost functions. Using these 
constants, the consumer can then obtain platform-dependent cost functions [8]. 
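The scenario above leaves open how the consumer computes the constants from the calibration runs. One plausible choice, assumed here and not necessarily the method used to obtain the constants of Table 2, is ordinary least squares over the calibration benchmarks. The sketch fits the constants from hypothetical per-benchmark operation counts and measured times, then combines them with hypothetical platform-independent cost functions to obtain a platform-dependent time function. All names (`calibrate`, `time_function`, `steps`, `unifs`) and numbers are illustrative.

```python
from typing import Callable, Sequence


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def calibrate(counts: Sequence[Sequence[float]], times: Sequence[float]) -> list[float]:
    """Least-squares constants K minimising sum_i (counts_i . K - times_i)^2,
    obtained from the normal equations (C^T C) K = C^T t."""
    k = len(counts[0])
    ata = [[sum(row[i] * row[j] for row in counts) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * t for row, t in zip(counts, times)) for i in range(k)]
    return solve(ata, atb)


def time_function(consts: Sequence[float],
                  cost_fns: Sequence[Callable[[int], float]]) -> Callable[[int], float]:
    """Platform-dependent time function: sum_j K_j * Cost_j(n)."""
    return lambda n: sum(kj * f(n) for kj, f in zip(consts, cost_fns))


# Hypothetical calibration data: per-benchmark counts of (resolution steps,
# unifications) and the times measured on the consumer's platform.
counts = [[10, 4], [20, 10], [5, 1], [8, 8]]
times = [22.0, 45.0, 10.5, 20.0]
consts = calibrate(counts, times)  # close to [2.0, 0.5] for this data


# Hypothetical platform-independent cost functions shipped in the certificate.
def steps(n: int) -> float:
    return (n * n + 3 * n + 2) / 2


def unifs(n: int) -> float:
    return n + 1


t = time_function(consts, [steps, unifs])
print(consts, t(10))  # t(10) is close to 137.5
```

With more calibration benchmarks than components, least squares averages out measurement noise instead of fitting any single benchmark exactly.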

Another application of the proposed approach is resource-oriented 
specialization. The proposed cost models, which include low-level factors for 
CLP programs, are more refined than previously proposed ones and can thus be 
used to better guide the specialization process. The inferred cost functions can 
be used to develop automatic program transformation techniques that take into 
account the size of the resulting program, its run time and memory usage, and 
other low-level implementation factors. In particular, they can be used to 
perform self-tuning specialization, comparing different specialized versions 
according to their costs [2]. 
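As a small illustration of comparing specialized versions by cost, the sketch below selects, among hypothetical specialized versions of a predicate, the one with the lowest estimated time for an expected input size, subject to a bound on program size. The `Version` record, the time functions, the size metric, and the selection rule are all assumptions made for illustration; the criterion used in [2] may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Version:
    name: str
    time_fn: Callable[[int], float]  # estimated run time for input size n
    code_size: int                   # size of the specialized program (e.g. clauses)


def pick_version(versions: Sequence[Version], n: int, max_code_size: int) -> Version:
    """Cheapest version (by estimated time at size n) among those within the size
    budget. Raises ValueError if no version fits the budget."""
    fitting = [v for v in versions if v.code_size <= max_code_size]
    return min(fitting, key=lambda v: v.time_fn(n))


# Hypothetical specialized versions of one predicate.
versions = [
    Version("original", lambda n: 3 * n + 10, code_size=4),
    Version("unfolded", lambda n: 2 * n + 30, code_size=9),
    Version("unrolled", lambda n: n + 100, code_size=40),
]
print(pick_version(versions, n=50, max_code_size=20).name)  # prints unfolded
```

Note that the best choice depends on the expected input size: for small inputs the original version wins, which is exactly why cost functions (rather than single cost values) are useful here.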

8 Conclusions 

We have developed a framework for estimating the execution times of the 
procedures of a program on a given execution platform. The proposed method 
combines compile-time (static) cost analysis with a one-time profiling of the 
platform in order to determine the values of certain constants. These constants 
calibrate a cost model from which time cost functions for a given platform can 
be computed statically. The approach has been implemented and integrated in 
the CiaoPP system. To the best of our knowledge, this is the first combined 
framework for statically and accurately estimating execution time bounds based 
on the automatic static inference of upper and lower bound complexity functions 
plus the experimental adjustment of constants. We have performed an experimental 
assessment of this implementation for a wide range of candidate cost models and 
two execution platforms. The results show that the combined framework predicts 
the execution times of programs with a reasonable degree of accuracy. We believe 
this is an encouraging result, since using a one-time profiling to estimate the 
execution times of other, unrelated programs is clearly a challenging goal. 

We also argue that the work presented in this paper offers an interesting 
trade-off between accuracy and simplicity. At the same time, there is clearly 
room for improving precision by using more refined cost models that take into 
account additional (lower-level) factors. Of course, such models would also be 
more difficult to handle, since on the one hand they would require computing more 
constants, and on the other hand they may require taking into account factors 
that are not observable at the source level. This is in any case the subject of 
possibly interesting future work. 